Abstract


Monte Carlo analysis of the combined phase-amplitude demodulator
problem will be discussed. The effect of machine architectural
differences on the nonlinear filtering algorithms will be emphasized.
Two machines will be considered: the CDC Star 100 and the Floating
Point Systems AP-120B, which are respectively a pipeline and a
multiprocessor array machine, in the context of numerical realization
of the optimal nonlinear filter for the demodulation problem.


1. Introduction

The area of nonlinear filtering has been quite active in the past ten
years, with numerous papers on subjects such as innovations, filtering
on Lie groups and a wide variety of suboptimal designs. Unfortunately,
most of this work has had little or no effect on the two fundamental
questions which must be answered before the theory can be applied:
namely, how does one build a real-time nonlinear filter, and how does
one determine the error performance of the optimal filter?

Recently, in a thesis at M.I.T., Galdos has made progress on the second
problem as well as showing the connection between information-theoretic
ideas and filtering (see [1]). For some time now we have attempted to
actually build nonlinear filters using digital computers (see [2] - [5]).
Originally, for the two-dimensional phase demodulation problem, our
object was to build the filter in order to do Monte Carlo error
analysis, and real-time filter operation was not considered. This was
because the synthesis tool, the CDC 6600, was large, slow and
expensive. Now, using the AP-120B processor, real-time operation is
feasible. Of course, the Star 100 is 2.5 times faster than the array
processor and, with its large memory, is an ideal tool for Monte Carlo
analysis.


2. Mathematical Description of the Problem

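Let $F_n$ {$P_n$} be the conditional density of the phase, $x_n$,
phase rate, $y_n$, and amplitude, $A_n$, at time $n$ given the
observations $z$ up to time $n$ {$n-1$}, where

$$
\begin{aligned}
z_n(1) &= h(A_n)\cos x_n + v_n(1) \\
z_n(2) &= h(A_n)\sin x_n + v_n(2)
\end{aligned}
\qquad (2.1)
$$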
with $v$ independent Gaussian white noises of spectral density $r$.
It can be shown that

$$
\begin{aligned}
P_{n+1}(x,y,A) &= \int\!\!\int a_1(A,\xi)\, a_2(y-\eta)\, F_n(x-\Delta\eta,\eta,\xi)\, d\eta\, d\xi \\
F_n(x,y,A) &= K \exp\Big\{ \frac{1}{r}\Big[ h(A)\, z_n(1) \cos x + h(A)\, z_n(2) \sin x - \tfrac{1}{2} h^2(A) \Big] \Big\}\, P_n(x,y,A)
\end{aligned}
\qquad (2.2)
$$

$K$ is chosen so that $F_n$ is a density, $\Delta$ represents the
discrete time step, and $a_2$ is periodic of period $2\pi/\Delta$.
Note that $F$ and $P$ are periodic in their first and second
arguments, of period $2\pi$ and $2\pi/\Delta$ respectively. $A_n$,
$x_n$, $y_n$ are Gauss-Markov processes; see [2] for details of the
model. For the Monte Carlo runs on the Star 100, $h$ was taken to be
linear; a more physically interesting model would be exponential, as
suggested by Dr. A. J. Mallinckrodt. We represent the densities $P_n$
and $F_n$ as weights over a moving cube of lattice points in $R^3$.
For fixed amplitude, the cross-section of the cubic lattice is a
uniform grid with 16 subdivisions in phase and 96 in phase rate.
There are 16 fixed-amplitude cross-sections in all, and the
configuration is centered at the best current estimate of amplitude,
with distances between cross-sections proportional to the square root
of the mean square error in estimating the amplitude.
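
The lattice just described can be sketched in a few lines of Python. The grid dimensions (16 in phase, 96 in phase rate, 16 in amplitude) are taken from the text; the function name `amplitude_levels`, the proportionality constant `c`, and the sample values of the amplitude estimate and its mean square error are illustrative assumptions, since the text does not give them.

```python
import math

N_PHASE, N_RATE, N_AMP = 16, 96, 16   # subdivisions quoted in the text

def amplitude_levels(a_hat, mse, c=0.5, n=N_AMP):
    """Amplitudes of the n fixed-amplitude cross-sections.

    a_hat : current best estimate of the amplitude (center of the grid)
    mse   : mean square error of that estimate
    c     : hypothetical proportionality constant (not given in the text)
    """
    step = c * math.sqrt(mse)        # spacing proportional to the RMS error
    offset = (n - 1) / 2.0           # centers the levels on a_hat
    return [a_hat + (k - offset) * step for k in range(n)]

levels = amplitude_levels(a_hat=1.0, mse=0.04)   # sample values, assumed
total_points = N_PHASE * N_RATE * N_AMP
print(total_points)                  # 24576
```

Because only `a_hat` and `mse` change from step to step, the lattice moves and stretches in the amplitude direction alone, while the phase and phase-rate cross-sections stay fixed.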
FIGURE 1

Notice that the grid structure moves only in the A direction. There
are 24,576 grid points in all.


3. Software Structure

Notice that (2.2) can be viewed as doing 16 computations of the
two-dimensional phase demodulation problem, followed by a convolution
over amplitude; see [3], [24] - [26]. This observation is the basis of
the software realization for both the Star 100 and the Floating Point
Processor. In order to explain the global structure of the software
for both the Star 100 and the array processor, it will be useful to
replace (2.2) by a series of simpler operations:

scrambling: $F_n(x,y,A) \to F_n(x-\Delta y,\, y,\, A)$   (2.3)

(Notice that as $F$ is periodic, modular arithmetic is involved.)

interpolation: $F_n(x_i - \Delta y_j,\, y_j,\, A_k) \approx \alpha F_n(x_i^H, y_j, A_k) + (1-\alpha) F_n(x_i^L, y_j, A_k)$   (2.4)

where $x_i^H$ {$x_i^L$} is the grid point directly above {below} the
value $x_i - y_j\Delta$, and

$$\alpha = \frac{|x_i^L - (x_i - \Delta y_j)|}{|x_i^H - x_i^L|}$$

phase convolution: $F_n^1 \to \sum_j a_2(\cdot, y_j)\, F_n^1(x_i, y_j, A_k)$   (2.5)

where $F^1$ is the image under the interpolation map.

amplitude convolution: $F_n \to \sum_\ell a_1(\cdot, A_\ell)\, F_n(x_i, y_j, A_\ell)$   (2.6)

new data: The second mapping in (2.2) with $K = 1$.

norming: Finding the sum of the weights, $K$.

normalization: Division by $K$.
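
The sequence of operations listed above can be sketched in plain Python on a small grid. This is an illustration under stated assumptions, not the Star or AP-120B code: the kernels are passed as dense matrices `a2[j][m]` and `a1[k][m]`, the grid sizes are tiny, and all function and variable names are hypothetical. The likelihood factor follows the second formula of (2.2) as read here, with $h(A) = A$ by default.

```python
import math

def predict(F, xs, ys, delta, a2, a1):
    """Scrambling with interpolation, then the two convolutions.

    F[k][j][i] is the weight at amplitude k, phase rate j, phase i.
    """
    K, N, M = len(F), len(ys), len(xs)
    dx = 2 * math.pi / M
    # scrambling + linear interpolation; the modulo makes phase periodic
    S = [[[0.0] * M for _ in range(N)] for _ in range(K)]
    for k in range(K):
        for j in range(N):
            for i in range(M):
                s = (xs[i] - delta * ys[j]) / dx
                lo = math.floor(s)
                alpha = s - lo                     # weight on the upper point
                S[k][j][i] = (alpha * F[k][j][(lo + 1) % M]
                              + (1 - alpha) * F[k][j][lo % M])
    # convolution over phase rate, with phase and amplitude fixed
    C = [[[sum(a2[j][m] * S[k][m][i] for m in range(N)) for i in range(M)]
          for j in range(N)] for k in range(K)]
    # convolution over amplitude
    return [[[sum(a1[k][m] * C[m][j][i] for m in range(K)) for i in range(M)]
             for j in range(N)] for k in range(K)]

def update(P, xs, amps, z, r, h=lambda a: a):
    """New data (likelihood with K = 1), norming, normalization."""
    W = [[[P[k][j][i] * math.exp(
              (h(a) * (z[0] * math.cos(x) + z[1] * math.sin(x))
               - 0.5 * h(a) ** 2) / r)
           for i, x in enumerate(xs)] for j in range(len(P[k]))]
         for k, a in enumerate(amps)]
    total = sum(w for plane in W for row in plane for w in row)   # norming
    return [[[w / total for w in row] for row in plane] for plane in W]

# Small demonstration grid (sizes and values are assumptions).
M, N, K = 8, 4, 3
xs = [2 * math.pi * i / M for i in range(M)]
ys = [0.1 * j for j in range(N)]
amps = [0.9, 1.0, 1.1]
eye = lambda n: [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
F0 = [[[1.0 / (M * N * K)] * M for _ in range(N)] for _ in range(K)]
P1 = predict(F0, xs, ys, delta=0.5, a2=eye(N), a1=eye(K))
F1 = update(P1, xs, amps, z=(1.0, 0.0), r=0.5)
```

With identity kernels the prediction step reduces to the shift in phase alone, which makes the sketch easy to check by hand; an observation pointing along phase zero then concentrates weight near $x = 0$.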


Since the memory bandwidth of Star 100 is extremely high as long as a
large number of contiguous memory locations are accessed sequentially,
one can neglect memory accesses and trade storage for computation
speed. By periodically extending the tensors representing $F$ and $P$,
the need for doing modular arithmetic can be eliminated. The detailed
code can be obtained through ICASE, attention Dr. Bob Voight, NASA
Langley, Hampton, Virginia. The original code was written by
Dr. H. Youssef of Lockheed Aircraft, Burbank, following the 2D
phase-lock loop code written by Dr. K. D. Senne. Youssef's code was
debugged and corrected by the author. $F_n(x,y,A)$ is carried by Star
as a vector whose component number $i + M(j-1) + MN(k-1)$ is the
weight associated with the $i$th phase grid point, the $j$th phase
rate grid point and the $k$th amplitude grid point. The structure of
the program is given in Figure 2 below.
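
The storage layout and the periodic extension can be illustrated as follows. This is a sketch only: it assumes that $M$ is the number of phase points (16) and $N$ the number of phase-rate points (96), which the text does not state explicitly, and the helper names are hypothetical.

```python
M, N, K = 16, 96, 16   # assumed: phase, phase-rate, amplitude grid sizes

def flat_index(i, j, k, m=M, n=N):
    """1-based position of grid point (i, j, k) in the long vector."""
    return i + m * (j - 1) + m * n * (k - 1)

def extend_periodic(row, pad):
    """Append the first `pad` samples so that shifted reads never wrap."""
    return row + row[:pad]

row = list(range(M))                 # one period of phase samples
shift = 5
ext = extend_periodic(row, pad=shift)

wrapped = [row[(i + shift) % M] for i in range(M)]   # modular arithmetic
contiguous = ext[shift:shift + M]                    # one contiguous slice

print(flat_index(1, 1, 1), flat_index(M, N, K))      # 1 24576
```

The extended copy costs extra storage but replaces the index arithmetic by a single sequential read, which is the trade the paragraph above describes.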


FIGURE 2 - STAR CODE        FIGURE 3 - AP-120B CODE

The unusual point in the Star code was viewing the phase convolution
as the sum of constants (i.e., values of the convolution kernel) times
the density. This viewpoint overcomes the need to reference memory
with fixed phase, which would lead to excessive run time on Star
because noncontiguous memory references would be necessary unless the
memory was rearranged. Rearrangement is also time-consuming. For Star,
the time necessary to accomplish one iteration of the loop pictured in
Figure 2 is 239 milliseconds for the 3-dimensional problem and
5 milliseconds for the 2-dimensional problem; see [5].
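
The difference between the two viewpoints can be made concrete with a small sketch. It assumes the phase index varies fastest in the long vector, a kernel that depends only on the index difference, and periodicity in phase rate; the function names and the sample kernel are hypothetical. The first routine gathers samples at stride `M` for each fixed phase, and the second forms the same result as a sum of kernel constants times contiguous shifted copies of the whole vector.

```python
def conv_strided(v, c, M, N):
    """Fixed-phase form: for each phase i, convolve over rate j (stride M)."""
    out = [0.0] * (M * N)
    for i in range(M):
        for j in range(N):
            out[i + M * j] = sum(c[d] * v[i + M * ((j - d) % N)]
                                 for d in range(len(c)))
    return out

def conv_vector_form(v, c, M, N):
    """Vector form: sum over kernel constants of constant * shifted vector."""
    off = M * (len(c) - 1)
    ext = v[len(v) - off:] + v            # periodic extension, no modulo
    out = [0.0] * (M * N)
    for d, cd in enumerate(c):
        start = off - M * d
        block = ext[start:start + M * N]  # one long contiguous read
        out = [o + cd * b for o, b in zip(out, block)]
    return out

M, N = 4, 6                               # small sizes for illustration
c = [0.5, 0.3, 0.2]                       # sample kernel values (assumed)
v = [((7 * p) % 11) / 10 for p in range(M * N)]
a = conv_strided(v, c, M, N)
b = conv_vector_form(v, c, M, N)
```

Each pass of the second routine is a scalar-times-vector accumulate over contiguous storage, which is the kind of operation a pipeline machine handles at full memory bandwidth.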

The assembly language code for the AP-120B was arranged in order to
minimize the number of reads and writes from memory, as memory
bandwidth is critical. The operations were arranged in two groups:
scrambling and interpolation with phase rate fixed, and convolution
with phase fixed. In each group a read and a write are necessary for
slightly more words than the number of grid points. Notice that the
loop cycles on the one-step predictor, $P_n$, rather than the filter,
$F_n$.

A significant feature of the AP-120B code is the parsimonious use of
memory, which is a must for effective real-time synthesis. In fact,
the Star code requires 3.5L* versus 1.2L memory locations for the
AP-120B code. The Star code runs 2.61 times faster than the AP-120B
code, which is quite close to the ratio of add plus multiply times of
the two machines, which is 2.78. The structure of the AP-120B loop was
suggested by Dr. K. D. Senne of M.I.T. Lincoln Laboratory, the
convolution loop was coded by Dr. A. J. Mallinckrodt of the
Communications Research Laboratory, and the interpolation and
scrambling loop by F. Ghovanlou of U.S.C.
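
The memory figures can be turned into word counts with a short calculation. The multipliers 3.5 and 1.2 and the grid size come from the text; the reading of "K" as 1,024 words and the variable names are assumptions.

```python
L = 16 * 96 * 16                 # grid points in the density representation

star_words = 3.5 * L             # Star code storage, in words
ap_words = 1.2 * L               # AP-120B code storage, in words

WORDS_PER_K = 1024               # assumed meaning of "K"
fits_16k = ap_words <= 16 * WORDS_PER_K
fits_64k = ap_words <= 64 * WORDS_PER_K

print(L, round(star_words), round(ap_words))   # 24576 86016 29491
print(fits_16k, fits_64k)                      # False True
```

On these assumptions the three-dimensional problem needs about 29,500 words in the AP-120B layout, more than a 16K-word memory holds but well within 64K words.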


4. Monte Carlo Programs

We developed a restartable Monte Carlo program for the Star 100. Each
Monte Carlo run consisted of 130 consecutive estimates in time. Our
program allowed us to do only a selected number of Monte Carlo runs at
a time, with a file written to allow us to continue later where we had
left off, i.e., the state of the random number generator is stored. In
this way, at each signal-to-noise ratio the requisite 200 Monte Carlo
runs could be accomplished in pieces of 40 runs, which would take
around 30 minutes of Star 100 CPU time. When $h(A) = A$ and the kernel
$a_1(A,\xi)$ approaches a delta function, it is clear that (2.2)
becomes closer and closer to the update for the 2-dimensional phase
demodulation problem studied in [3]. Using this fact, the
3-dimensional problem was debugged by comparing the results to the
2-dimensional situation.
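
The restart mechanism can be sketched with the Python standard library. This is not the Star 100 program: the checkpoint format, the function name `run_batch`, the seed, and the Gaussian draw standing in for a filter error sample are all assumptions made for illustration. The point shown is that storing the random number generator state with the running totals makes five pieces of 40 runs reproduce one uninterrupted job of 200 runs.

```python
import os
import pickle
import random
import tempfile

def run_batch(path, n_runs, steps=130):
    """Do n_runs Monte Carlo runs, resuming from the checkpoint at `path`."""
    rng = random.Random(12345)             # used only when no checkpoint exists
    done, sq_err_sum = 0, 0.0
    if os.path.exists(path):
        with open(path, "rb") as f:
            state, done, sq_err_sum = pickle.load(f)
        rng.setstate(state)                # continue the same random stream
    for _ in range(n_runs):
        for _ in range(steps):             # 130 consecutive estimates per run
            err = rng.gauss(0.0, 1.0)      # placeholder for an estimation error
            sq_err_sum += err * err
        done += 1
    with open(path, "wb") as f:
        pickle.dump((rng.getstate(), done, sq_err_sum), f)
    return done, sq_err_sum

tmp = tempfile.mkdtemp()
pieces = os.path.join(tmp, "pieces.pkl")
whole = os.path.join(tmp, "whole.pkl")

for _ in range(5):                         # 200 runs in pieces of 40
    result_pieces = run_batch(pieces, 40)
result_whole = run_batch(whole, 200)       # the same 200 runs in one job
```

Because the generator state is part of the checkpoint, the pieced job draws exactly the same random numbers as the single job, so the accumulated statistics agree.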

When $h(A) = A$ and $a_1$ is Gaussian with variance $q_3 = .1$ and
mean 1, we have done Monte Carlo runs at output signal-to-noise ratios
$R = -3$ and $-1.8$ db. The resulting error variances were extremely
close to those which we found in [3] for the corresponding
2-dimensional problem. In the future we intend to investigate the
problem where $h(A)$ is exponential by Monte Carlo analysis, as this
may be a more realistic model of the real-world problem.

The Floating Point Systems AP-120B is connected to a PDP11-55 computer
with the multi-user operating system RSX-11M version 3. At the moment
the Floating Point Processor has 16K words of memory; in the near
future we will expand the memory to 64K words, which will allow us to
run the problem described here. In the interim, a background task has
been installed which is a restartable Monte Carlo task for the
2-dimensional phase demodulation problem; see [3]. This task has
allowed the evaluation of phase error statistics for extremely large
numbers of Monte Carlo runs, 30,000, at various output signal-to-noise
ratios. Each run then computes the mean square phase estimation error
on the basis of 3,900,000 estimates. These runs produce a one-sigma
confidence interval of length .008 db and allow very precise
statements concerning the db improvement of the optimal phase
demodulator over the traditional phase-lock design. Actual estimate
production time is due 50% to the AP-120B and 50% to the overhead of
communications between the AP-120B and the 11-55 and to PDP-11 tasks.
The only tasks currently done in the 11-55 are generation of the
observations and signal process, Monte Carlo statistics, and
evaluation of the sensor (i.e., evaluation of the exponential in the
second formula of (2.2)). The large contribution of the PDP-11 to the
estimate time, in view of the limited tasks it performs, is
disturbing, and we are currently looking into where the time is used
by timing various parts of the driver program. Because of its 50-100
to one floating point computation speed advantage, in either a
real-time or a simulation environment the AP-120B should be assigned
all the floating point computation tasks. The system programming
described in this section was done by Tom Bleakney of U.S.C.


5. Real Time Capability

The Floating Point Systems AP-120B, because of its size, speed and
cost, now makes possible a real-time nonlinear phase demodulator when
used in conjunction with the PDP11-55. With direct memory access, the
observations are sent to the PDP11-55 and input to the AP-120B while
it is computing the previous density (i.e., the update of the density
is overlapped with the data acquisition and the estimate acquisition).
It appears that the data rate could be 100 per second. Even higher
data rates could be achieved in the near future with improved hardware
now available in the same multiprocessor array configuration.


6. Conclusions

We have attempted to demonstrate how software was developed for
nonlinear filter realization which took advantage of the machine
architecture of the Star 100 and the AP-120B array processor. We feel
that the revolutions in machine design, speed and size have made
real-time nonlinear filter construction possible. In the near future
it is clearly feasible to design real systems whose operation is
governed by nonlinear filters.


* L represents the number of grid points for the density representation.